Adaptive, Model-Based Monitoring and Threat Detection

Alfonso Valdes
Keith Skinner

SRI International
333 Ravenswood Ave.
Menlo Park, CA 94070

Summary 

With the expanded use of networked computers and the proliferation of high-bandwidth connections, there has been an unfortunate increase in computer and network abuse. Detecting and correlating incidents of abuse is an essential aspect of information assurance. To date, intrusion detection systems (IDS) have relied on matching “signatures” of known attacks against those contained in a knowledge base. This approach, while effective, misses variants of known attacks as well as new attacks that are not in the knowledge base. Conversely, anomaly detection approaches have the potential to recognize novel attacks, but in practice have been hampered by low sensitivity, lack of specificity, and unacceptable false alarm rates. This research effort explored probabilistic approaches such as Bayes systems, which encode their knowledge base not as specific signatures but as conditional probability relations. Rather than relying on rules for metrics related to Transmission Control Protocol (TCP) connections, the system adaptively learns these metrics and also discovers hosts and services on the monitored network. The result is an IDS that detects many novel attacks and, aided by its adaptive capability, achieves acceptable sensitivity and false alarm rates.

The number of IDS alerts in typical systems can overwhelm network security officers, raising the need for effective prioritization and correlation of alert messages. We address this need in two ways. First, in cooperative research with the SRI Mission-Based Correlation effort (supported by the same DARPA program), we reused the Bayes inference library from the TCP detector to implement an inference engine for alert ranking and prioritization. This capability allows the security administrator to specify preferences in a configuration file, and includes an adaptive capability that dynamically adjusts the internal knowledge base guided by administrator decisions. Second, we explored probabilistic techniques for alert correlation. The approach mixes Bayes techniques with concepts from sensor fusion. This correlation component shares the ranking and prioritization capability of the Mission-Based Correlator. The probabilistic correlation system has proven effective with alerts that report incomplete content, and with reports from heterogeneous sensors that are at variance with each other. The system demonstrates the ability to thread an attack from probe to internal exploit, as well as to recognize reports from multiple heterogeneous sensors as describing the same incident. These capabilities have been demonstrated in our live network, in a pilot deployment at a government agency, and with the Cyberpanel Grand Challenge Problem (GCP).


1. Introduction and Overview

The field of intrusion detection has considered a variety of approaches based on anomaly detection and on signature- or rule-based systems. A subclass of anomaly detection systems attempts to learn behavior in a probabilistic sense. Anomaly detection systems are attractive in their ability to detect novel, unusual activity, while signature systems lack this generalization potential. However, anomaly detection has enjoyed only modest success in practice, due to lack of specificity and unacceptable false alarm rates. Moreover, many important attacks do not manifest as anomalies in the features these systems observe. Signature systems have enjoyed greater success, and all leading commercial and most research systems are in this class. These systems incorporate a knowledge base that can be as simple as matching suspicious patterns in packet traffic or as sophisticated as “stateful” systems such as EMERALD [Por97].

Our objective in undertaking this research was to explore a middle ground: the class of systems that incorporate a knowledge base in a probabilistic sense, giving the system some generalization potential together with greater sensitivity and specificity. Bayes networks represent knowledge as conditional probability relations between observable features and hypotheses of use and misuse. Moreover, conditional probabilities are maintained as internal tables that can adapt in response to new observations. The first component we implemented was a Bayes sensor for attacks visible in TCP header traffic. This sensor proved effective on the Lincoln Laboratory 1999 evaluation data set [Lip00], and was further validated through extensive experimentation with live traffic. Today, this component runs live in our environment as well as at the National Security Agency (NSA).

One innovative feature of this system is that it is actually a coupled sensor, with one subcomponent that adaptively learns hosts and services on the monitored network, and another that uses this information to adjust its state dynamically. The result is improved sensitivity with a reduced false alarm rate, particularly for false alarms that are side effects of an attack or a nonmalicious failure (what we term “collateral damage”). This host availability monitor is useful in its own right in discovering new and possibly unauthorized services as well as notifying the administrator of nonmalicious failures.
Besides sensor coupling, which can be considered a form of sensor state correlation, we have explored three aspects of alert correlation. In alert threading (within-sensor correlation), the sensor maintains a concept of session and issues alerts for suspicious sessions that consolidate many low-level events. For many attacks, such as address sweeps, the reduction in the number of alert messages can be two orders of magnitude. Threading is a concept common to most EMERALD sensors, but is absent from many other systems. In incident correlation, multiple reports from different and possibly heterogeneous sensors are recognized as describing the same incident. Scenario correlation chains together multiple attack steps (each a thread or incident) to reassemble more complex attacks. We have implemented a correlation engine based on probabilistic inference to accomplish these goals. The system adapts concepts from heterogeneous sensor fusion and a transition model for multistep attacks. The probabilistic approach is robust against incomplete or conflicting information from multiple sensors, which will remain the state of affairs as standards such as IDMEF [Cu01] are adopted in varying degrees. This system shares with the EMERALD Mission-Based Correlation a subsystem to prioritize and rank alerts. This subsystem in turn is based on the same Bayes inference code that underlies the Bayes TCP sensor, and adaptively learns the security administrator’s preferences for alert ranking and prioritization.

The remainder of this report is organized as follows. By way of background, we provide relevant material describing Bayesian inference, including the representation of a knowledge base as conditional probability relations. We then describe the components developed in our work, namely, the Bayes TCP sensor and the probabilistic correlation module. In the description of the latter, we give an overview of the prioritization and ranking module shared with the Mission-Based Correlator. We give results of experimentation and use of these components in attack simulations as well as live traffic analysis. We then give conclusions and suggestions for further work.


2. Background

Intrusion detection to date has considered system audit trails [Val94, Lin00] and monitored network traffic [SNORT, ISS, Por97]. Most of these systems use rule-based inference, ranging from simple pattern matching [SNORT, ISS] to systems that maintain some notion of session state [Por97, Lin00]. A minority employ some form of anomaly detection, including variants based on nonparametric statistics [Val94, Ski98], sequence analysis [For96], and data mining [Lee00]. To date, rule-based systems have dominated the field, comprising all major commercial intrusion detection systems (IDS) and most research efforts. Moreover, critics of anomaly detection correctly point out that intrusions are not necessarily anomalous, and anomalies are not necessarily intrusive [McH01]. This observation raises an objection to anomaly detection in principle; additionally, in practice these systems have not shown sufficient sensitivity and sufficiently low false alert rates to gain wide acceptance. On the other hand, critics of rule-based systems point out that such systems may not be capable of detecting novel attacks. Our research explores systems in which models of malicious use are not expressed as specific signatures, which can be bypassed by varying the attack slightly, but are encoded in a probabilistic sense. Bayes networks [Pearl88] are particularly suitable for this representation, relating hypotheses to observable evidence by means of conditional probability relations. Our system adapts as its view changes (mathematically, by changing prior belief among competing hypotheses or modifying conditional probability relations appropriately). Our goal was to develop a system with specificity, sensitivity, and false alarm rate comparable to the better rule-based systems, while retaining some of the potential of anomaly detection to detect novel attacks. To this end, we developed a TCP session monitor capable of detecting attacks visible in TCP header data, which adaptively learns system resources and parameters such as typical connection completion times. This system is probabilistic (inference is based on Bayes probability), adaptive (some parameters are learned by the system as it observes the monitored network), and model based (important classes of misuse are encoded as conditional probability models). It includes a capability based on more traditional anomaly detection that generates alerts for extremely unusual patterns of TCP port use, but the core detection capability is Bayesian and model based [Val00].

We were also interested in the problem of IDS alert correlation, which was largely unexplored when our effort began, although there have been some contributions in the interim [De01, Hoa01]. As in intrusion detection, most correlation approaches to date are based on rules and heuristics, and as with our intrusion detection component, we explored a probabilistic approach instead. The Bayes paradigm of maintaining a prior belief over a number of hypotheses and updating this belief as new evidence is observed is used to model attack evolution over time. To assess whether a newly observed alert is plausibly connected to an existing set of correlated alerts, we define a number of features in the alert and employ an approach somewhat analogous to that used in multisensor data fusion [Hall92]. While concepts from traditional multisensor fusion provide useful guidance, definitions of such concepts as feature similarity are specific to the intrusion correlation domain. We were able to develop a successful prototype system capable of correlating alerts from heterogeneous sensors, even when the sensors disagreed as to specifics of the alert or the alert messages were improperly formed. As standards for alert interchange are still fairly immature, we feel that the inherent robustness of probabilistic systems gives them an important advantage [Val01].

3. Methods, Approaches, and Procedures
Bayes TCP Sensor

Foundations

Mathematically, we have adapted the framework for belief propagation in causal trees from Pearl [Pearl88]. Knowledge is represented as nodes in a tree, where each node is considered to be in one of several discrete states. A node receives π (prior, or causal support) messages from its parent, and λ (likelihood, or diagnostic support) messages from its children as events are observed. We think of priors as propagating downward through the tree, and likelihood as propagating upward. These are discrete distributions, that is, they are nonnegative and sum to unity. The prior message incorporates all information not observed at the node. The likelihood at terminal or “leaf” nodes corresponds to the directly observable evidence. A conditional probability table (CPT) links a child to a parent. Its elements are given by

$$CPT_{ij} = P(\text{state} = j \mid \text{parent\_state} = i)$$

As a consequence of this definition, each row of a CPT is a discrete distribution over the node states for a particular parent node state, that is,

$$CPT_{ij} \ge 0 \quad \forall i, j, \qquad \sum_j CPT_{ij} = 1 \quad \forall i$$

The basic operations of message propagation in the tree are most succinctly expressed in terms of vector/matrix algebra. We adopt the convention that prior messages are represented as row vectors. Downward propagation of the prior messages is achieved by left-multiplying the CPT by the parent’s prior, that is,

$$\pi(\text{node}) = \alpha\, \pi(\text{parent\_node}) \cdot CPT$$

where α is a normalizing constant that ensures the result sums to unity. Note that since the CPT is not required to be square, the number of elements in π(node) and π(parent_node) may be different. Since we limit ourselves to trees, there is at most one parent per node. However, there may be multiple children, so upward propagation of the likelihood messages requires a fusion step. For each node, the λ message, represented as a column vector, is propagated upward via the following matrix computation:

$$\lambda\_to\_parent(\text{node}) = CPT \cdot \lambda(\text{node})$$

Note that λ(node) has a number of elements equal to the number of states in the node, while λ_to_parent(node) has a number of elements equal to the number of states in the parent node. These messages are fused at the parent via elementwise multiplication:

$$L_j(\text{parent}) = \prod_{c \in \text{children}} \lambda\_to\_parent_j(c), \qquad \lambda_j(\text{parent}) = \frac{L_j(\text{parent})}{\sum_k L_k(\text{parent})}$$

Here, L represents the raw elementwise product, and λ is obtained by normalizing this to unit sum. Finally, the belief over the states at a node is obtained as follows:

$$BEL_i = \beta\, \pi_i\, \lambda_i$$

where β is a normalizing constant so that BEL has unit sum. Figure 1 illustrates propagation in a fragment of a tree.
Figure 1: Message Propagation in a Tree Fragment
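To make these conventions concrete, the following is a minimal Python sketch (not the EMERALD implementation) of the three operations above for a root with two hypotheses and two leaf children. The function names, CPT values, prior, and evidence are hypothetical and chosen so the arithmetic can be checked by hand; messages are plain lists, π messages are treated as row vectors, and CPT rows index parent states.

```python
def normalize(v):
    """Scale a nonnegative vector so that it sums to one."""
    total = sum(v)
    return [x / total for x in v]


def pi_down(parent_pi, cpt):
    """pi(node) = alpha * pi(parent_node) . CPT, with pi as a row vector."""
    n_child = len(cpt[0])
    raw = [sum(parent_pi[i] * cpt[i][j] for i in range(len(cpt)))
           for j in range(n_child)]
    return normalize(raw)


def lambda_up(cpt, node_lambda):
    """lambda_to_parent(node) = CPT . lambda(node), with lambda as a column vector."""
    return [sum(p * lam for p, lam in zip(row, node_lambda)) for row in cpt]


def fuse(messages):
    """Elementwise product L of the children's messages, normalized to lambda(parent)."""
    raw = [1.0] * len(messages[0])
    for msg in messages:
        raw = [r * m for r, m in zip(raw, msg)]
    return normalize(raw)


def belief(pi, lam):
    """BEL_i = beta * pi_i * lambda_i."""
    return normalize([p * lk for p, lk in zip(pi, lam)])


# Hypothetical example: root states (normal, attack), two leaf children.
root_prior = [0.9, 0.1]
cpt_a = [[0.8, 0.2], [0.3, 0.7]]   # rows: root states; columns: child states
cpt_b = [[0.9, 0.1], [0.4, 0.6]]
evidence = [0.0, 1.0]              # each leaf directly observed in its second state
root_lambda = fuse([lambda_up(cpt_a, evidence), lambda_up(cpt_b, evidence)])
print(belief(root_prior, root_lambda))   # approximately [0.3, 0.7]
```

Each leaf sends the second column of its CPT upward ([0.2, 0.7] and [0.1, 0.6]); their product [0.02, 0.42] outweighs the 0.9 prior on the first hypothesis, so the root belief shifts to roughly [0.3, 0.7].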

Adaptive CPT Adjustment

The system is preconfigured with CPTs relating observable features to normal and misuse hypotheses. These CPTs can adaptively evolve to adjust to specific environments. Adaptation via reinforcement proceeds as follows. Recall that the CPT relates a child node to its parent. In our representation, the rows of the CPT correspond to parent states, while the columns correspond to child states. If a single hypothesis is dominant at the root node, we adapt the corresponding row of the CPT matrix at each child slightly in the direction of the λ message at the child node for the present observation. Specifically, if hypothesis i “wins” at the root node, we adjust the CPT as follows. First, we decay the internal effective counts via a decay function:

$$counts_k^{\text{decay}} = \gamma \cdot counts_k + (1 - \gamma) \quad \text{for each parent state } k$$

The decayed count is used as a “past weight” for the adjustment, and is the effective number of times this hypothesis has been recently observed. The CPT row is first converted to effective counts for each child state, and the present observation is added as an additional count distributed over the same states. Then the row elements are divided by the row sum so that the adjusted row has unit sum. This is accomplished by the following equation:

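$$CPT_{ij} \leftarrow \frac{counts_i^{\text{decay}} \cdot CPT_{ij} + \lambda_j}{\sum_{j'} \left( counts_i^{\text{decay}} \cdot CPT_{ij'} + \lambda_{j'} \right)}$$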
Finally, the internal counts are recomputed for all parent states:

$$counts_k = counts_k^{\text{decay}} + \begin{cases} \gamma, & k = i \ (\text{hypothesis } i \text{ is the winner}) \\ 0, & \text{otherwise} \end{cases}$$


By this procedure, the effective count never decays below 1.0 (if the hypothesis is never observed) and never grows beyond 1/(1 - γ) (if the hypothesis is always observed). We typically choose the decay factor so that the effective count grows to between 200 and 1000 observations, that is, γ between 0.995 and 0.999. Observations for frequently seen hypotheses therefore cause a smaller CPT adjustment than do observations for rare hypotheses. In addition, since only “winning” hypotheses cause a potential CPT adjustment, our system has one key advantage over other statistical ID systems: a large number of observations for a hypothesis corresponding to an attack will not come to be considered “normal” no matter how frequently it is observed, as each adjustment only reinforces the corresponding internal attack hypothesis model in the system.
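The reinforcement step can be sketched in Python as follows. This is an illustrative reading of the equations above, not the system's code: it assumes effective counts start at 1.0, that the caller has already decided which hypothesis is dominant (the dominance criterion is not specified here), and that the child's λ message sums to one. The function name `reinforce` is hypothetical.

```python
def reinforce(cpt, counts, winner, child_lambda, gamma):
    """Nudge row `winner` of a child's CPT toward the child's lambda message.

    cpt:          list of rows, one per parent (hypothesis) state; modified in place
    counts:       effective count per parent state, assumed to start at 1.0; modified in place
    winner:       index i of the hypothesis that dominates at the root
    child_lambda: lambda message at the child for this observation, summing to one
    gamma:        decay factor with 0 < gamma < 1
    """
    # Decay every parent state's effective count toward 1.0.
    decayed = [gamma * c + (1.0 - gamma) for c in counts]
    past_weight = decayed[winner]
    # Convert the winning row to effective counts, add the observation, renormalize.
    raw = [past_weight * p + lam for p, lam in zip(cpt[winner], child_lambda)]
    total = sum(raw)
    cpt[winner] = [r / total for r in raw]
    # Only the winning hypothesis gains gamma; the others keep their decayed counts.
    for k in range(len(counts)):
        counts[k] = decayed[k] + (gamma if k == winner else 0.0)


cpt = [[0.5, 0.5], [0.5, 0.5]]   # hypothetical starting CPT for one child
counts = [1.0, 1.0]
reinforce(cpt, counts, winner=0, child_lambda=[1.0, 0.0], gamma=0.9)
print(cpt, counts)   # row 0 approximately [0.75, 0.25]; counts approximately [1.9, 1.0]
```

With γ = 0.995, reinforcing the same hypothesis repeatedly drives its effective count toward 1/(1 - 0.995) = 200, while a hypothesis that never wins stays at 1.0, and only the winning row of the CPT ever moves.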

State Transition

As a simplifying assumption, the states observed for the respective variables are considered to be independent of what was observed for these variables in past inference intervals, given the session class. In addition, given the value of the session class in the current interval, X is independent of any other observable variable Y. In other words, for all observable variables X, Y and inference intervals 0 to k, we have

$$P(X_k = x \mid Sess\_class_k = s, X_{k-1}, \ldots, X_0, Y_{k-1}, \ldots, Y_0) = P(X_k = x \mid Sess\_class_k = s)$$

The evolution of session class over inference intervals is modeled as a discrete time-and-state Markov process. The transition matrix is a convex combination of an identity matrix (to express state persistence) and a matrix whose rows are all equal to some prior distribution over the possible values of session class (to express the tendency of the process to decay to some prior state). In other words, for some 0 < γ < 1, the transition matrix M is given by

$$M = \gamma I + (1 - \gamma) P$$

where I is an identity matrix and each row of P is given by
$$P_i = PRIOR \quad \text{for every row } i$$

and PRIOR is a prior distribution over possible values j for session class, that is,

$$PRIOR_j = \text{Prior probability}(Sess\_class = j)$$

M_ij is the probability that if the process is currently in state i it will be in state j at the next event. More generally, if POST_BEL is our current belief state (a distribution over the possible state values, given the evidence up to and including this time interval), multiplying this row vector by M redistributes our belief to obtain the prior belief before the next observation:

$$PRE\_BEL_k = POST\_BEL_{k-1} \cdot M$$

We manipulate the parameter γ to capture, albeit imperfectly, the continuous nature of the underlying process. We typically invoke the inference function every 100 events within a session, and always when the session enters the idle state. Some sessions are fewer than 100 events in total, while others, particularly many denial-of-service (DOS) attacks, consist of tens of thousands of events in a very short time interval. In the latter case, even though many inference steps are invoked, we prefer to have a moderately high persistence parameter (about 0.75) because very little time has elapsed. If the parameter is 0, the belief reverts to the prior at each event.
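A minimal sketch of the transition model follows, assuming beliefs are plain Python lists over session-class values. The helper names `transition_matrix` and `predict` are hypothetical, and the prior and γ in the example are illustrative only.

```python
def transition_matrix(prior, gamma):
    """M = gamma * I + (1 - gamma) * P, where every row of P equals `prior`."""
    n = len(prior)
    return [[gamma * (1.0 if i == j else 0.0) + (1.0 - gamma) * prior[j]
             for j in range(n)]
            for i in range(n)]


def predict(post_bel, m):
    """PRE_BEL_k = POST_BEL_{k-1} . M, treating the belief as a row vector."""
    return [sum(post_bel[i] * m[i][j] for i in range(len(m)))
            for j in range(len(m[0]))]


m = transition_matrix([0.9, 0.1], 0.75)   # hypothetical prior; persistence about 0.75
print(predict([0.2, 0.8], m))             # approximately [0.375, 0.625]
```

Because every row of P equals PRIOR and the belief sums to one, the prediction reduces to γ·POST_BEL + (1 - γ)·PRIOR, which gives an easy check of the example: 0.75·[0.2, 0.8] + 0.25·[0.9, 0.1] = [0.375, 0.625]. Setting γ = 0 returns the prior exactly, matching the remark above.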

It can be shown that, unless γ is unity, iteratively multiplying M by itself results in a matrix that approaches P, that is,

$$\lim_{n \to \infty} M^n = P$$

In practice, this limit is nearly reached for fairly small values of n. The result of this observation is attractive from an intuitive standpoint: in the absence of reinforcing evidence from subsequent events, the belief distribution tends to revert to the prior.
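The limit follows from two properties of P: because every row equals PRIOR and PRIOR sums to one, P² = P, and trivially IP = PI = P. Assuming the closed form holds for M^n (it does for n = 1), induction gives the closed form for every power of M:

```latex
\begin{aligned}
P^2 &= P, \qquad IP = PI = P \\
M^{n+1} &= \left(\gamma^n I + (1-\gamma^n) P\right)\left(\gamma I + (1-\gamma) P\right) \\
        &= \gamma^{n+1} I + \left[\gamma^n (1-\gamma) + \gamma (1-\gamma^n) + (1-\gamma^n)(1-\gamma)\right] P
         = \gamma^{n+1} I + (1-\gamma^{n+1}) P \\
\Rightarrow \quad M^n &= \gamma^n I + (1-\gamma^n) P \;\longrightarrow\; P \quad \text{as } n \to \infty,\ 0 \le \gamma < 1
\end{aligned}
```

For the persistence value of about 0.75 mentioned above, γ^10 ≈ 0.056, so after ten inference steps without new evidence less than 6% of the weight remains on the identity term, which is why the limit is nearly reached for small n.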

The inference operation at interval k begins by setting the Bayes π message to PRE_BEL_k. Then the observables over the interval are presented to the leaf nodes, and the belief state at the root node is extracted. If this is deemed sufficiently suspicious, the system generates an alert message that can be displayed at a console or forwarded to a correlation utility.

Probabilistic Correlation

Our probabilistic correlation approach considers feature overlap, feature similarity, minimum similarity, and expectation of similarity. In this context, a “feature” is the value of a field in the alert that is pertinent to alert correlation. Features include source and target network addresses and ports, the type of attack, and the attack time. We maintain a list of “meta alerts” that are possibly composed of several alerts, potentially from heterogeneous sensors. For two alerts (typically a new alert and a meta alert), we begin by identifying features they have in common (feature overlap). Such features include the source of the attack, the target (hosts and ports), the class of the attack, and time information. With each feature, we have a similarity function that returns a number between 0 and 1, with 1 corresponding to a perfect match. Similarity is a feature-specific function that considers such issues as

• How well do two lists overlap (for example, the list of targeted ports)?

• Is one observed value contained in the other (for example, is the target port of a DOS attack one of the ports that was the target of a recent probe)?

• If two source addresses are different, are they likely to be from the same subnet?
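The first two questions can be answered with simple set operations. The sketch below assumes features are stored as Python lists and uses the Jaccard index for list overlap; that measure and the function names are assumptions for illustration, not the measure used in the system described here.

```python
def overlap_similarity(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature lists (an assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty lists are treated as a perfect match
    return len(sa & sb) / len(sa | sb)


def containment_similarity(observed, reference):
    """Fraction of the observed values that also appear in the reference list."""
    so = set(observed)
    if not so:
        return 1.0  # nothing observed, so nothing can fail to be contained
    return len(so & set(reference)) / len(so)


probe_ports = [21, 23, 25, 80]                         # hypothetical probe target ports
print(containment_similarity([80], probe_ports))       # 1.0: the DOS target port was probed
print(overlap_similarity(probe_ports, [23, 80, 111]))  # 0.4
```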

For attack class similarity, we maintain a matrix of similarity between attack classes, with values of unity along the diagonal and off-diagonal values that heuristically express similarity between the corresponding attack classes. We prefer to consider attack classes rather than attack signatures, which are much more specific and numerous but may be erroneously or incompletely reported. For example, in our demonstration environment, we run a variant of mscan that probes certain sensitive ports, that is, it is of the attack class “portsweep”. Our host sensors have a specific signature for this attack and call it “mscan”. The Bayes sensor trades specificity for generalization capability and has no “mscan” model, but successfully detects this attack as a “portsweep”. These reports are considered similar (S = 1) with respect to attack class.
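An attack-class similarity table of this kind can be sketched as a symmetric lookup with an implicit unit diagonal, preceded by a mapping from sensor-specific signature names to classes. Apart from “mscan” and “portsweep”, the class names are hypothetical placeholders, and all off-diagonal values are illustrative rather than taken from the system.

```python
# Hypothetical mapping from sensor-specific signature names to attack classes.
SIGNATURE_TO_CLASS = {"mscan": "portsweep"}

# Hypothetical off-diagonal similarities; the diagonal is implicitly 1.0.
OFF_DIAGONAL = {
    frozenset(("portsweep", "ipsweep")): 0.6,
    frozenset(("portsweep", "syn-flood")): 0.3,
}


def class_similarity(a, b):
    """Similarity of two reported attack classes (or signatures), in [0, 1]."""
    a = SIGNATURE_TO_CLASS.get(a, a)
    b = SIGNATURE_TO_CLASS.get(b, b)
    if a == b:
        return 1.0
    # frozenset keys make the table symmetric; unknown pairs get 0.0.
    return OFF_DIAGONAL.get(frozenset((a, b)), 0.0)


print(class_similarity("mscan", "portsweep"))   # 1.0, as in the host/Bayes sensor example
```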

Not all sensors produce all possible identifying features. For example, a host sensor provides a process identifier, while a network sensor does not. Features not common to both alerts are not considered for the overall similarity match. The meta alert itself supports the threading concept, so we can visualize composing meta alerts from meta alerts.

Similarity Expectation and Minimum Similarity

An important innovation we introduce is expectation of similarity. Like similarity, this is a value between 0 and 1; it expresses our prior expectation that the feature should match if the two alerts are related, considering the specifics of each. For example, two probes from the same source might scan the same set of ports on different parts of our subnet (so the expectation of matching target IP address is low). Also, some attacks such as SYN FLOOD spoof the source address, so we would allow a match with an earlier probe of the same target even if the source does not match (the expectation of a match for source IP is low).

We now give some examples of how expectation of similarity depends on the situation, that is, on the features in the meta alert and the new alert.

If an alert from a sensor has a thread identifier that matches the list of sensor/thread identifiers for some meta alert, the alert is considered a match and fusion is done immediately. In other words, the individual sensor’s determination that an alert is an update of, or otherwise related to, one of its own alerts overrides other considerations of alert similarity.

If the meta alert has received reports from host sensors on different hosts, we do not expect the target host feature to match. If at least one report from a network sensor has contributed to the meta alert and a host sensor alert is received, the expectation of similarity is that the target address of the latter is contained in the target list of the former.

In determining whether an exploit can plausibly be considered the next stage of an attack for which a probe was observed, we expect the target of the exploit (the features host and port) to be contained in the target host and port list of the meta alert.

Some sensors, particularly those that maintain a degree of state, report start and end times for an attack, while others can only timestamp a given alert. The former deal with time intervals, while the latter do not. Similarity in time accounts for overlap of the time intervals in the alerts considered for correlation, as well as the notion of precedence. We do not let time similarity fall far below unity if the time difference is plausibly due to clock drift.
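One way to encode overlap, precedence, and clock-drift tolerance in a single time-similarity function is sketched below. The exponential decay with the gap, the drift tolerance, the decay scale, and the choice to return zero when the new alert clearly precedes the meta alert are all assumptions made for illustration; the text does not specify the functional form.

```python
import math


def time_similarity(meta, new, drift=5.0, scale=600.0):
    """Time similarity between a meta alert and a new alert (hypothetical form).

    Each alert is a (start, end) pair in seconds; a sensor that only
    timestamps alerts supplies start == end. `drift` (tolerated clock skew)
    and `scale` (decay length for gaps) are hypothetical parameters.
    """
    m_start, m_end = meta
    n_start, n_end = new
    # Intervals that overlap, allowing for clock drift, match perfectly.
    if n_start <= m_end + drift and m_start <= n_end + drift:
        return 1.0
    # The new alert ends well before the meta alert starts: treat as out of order.
    if n_end < m_start - drift:
        return 0.0
    # The new alert follows the meta alert: similarity decays with the gap.
    gap = n_start - (m_end + drift)
    return math.exp(-gap / scale)


print(time_similarity((0, 100), (103, 103)))   # 1.0: 3 s after the end, within drift
```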
Deciding whether the attacker is similar is somewhat more involved. In the case of an exact match of originating IP address, similarity is perfect. We assign high similarity if the subnet appears to match. In this way, a meta alert may potentially contain a list of attacker addresses, and we then consider similarity based on containment. In addition, if an attacker compromises a host within our network (as inferred by a successful outcome for an attack of the root compromise class), that host is added to the list of attacker hosts for the meta alert in question. Finally, for attack classes where the attacker’s address is likely to be spoofed (for example, the Neptune attack), the similarity expectation with respect to attacker address is assigned a low value.
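The attacker-address rules can be sketched with Python’s standard `ipaddress` module. The /24 subnet boundary, the 0.8 subnet score, the 0.1 expectation for spoofable classes, and the class names are hypothetical choices; the text states only that subnet matches receive high similarity, that compromised hosts join the attacker list, and that spoofable classes receive low expectation.

```python
import ipaddress


def attacker_similarity(meta_attackers, new_source, prefix=24, subnet_score=0.8):
    """Similarity of a new alert's source to a meta alert's attacker list.

    Exact match -> 1.0; same subnet as a listed attacker -> subnet_score;
    otherwise 0.0. The /24 prefix and the 0.8 score are hypothetical.
    """
    new_ip = ipaddress.ip_address(new_source)
    best = 0.0
    for addr in meta_attackers:
        known = ipaddress.ip_address(addr)
        if known == new_ip:
            return 1.0
        subnet = ipaddress.ip_network(f"{known}/{prefix}", strict=False)
        if new_ip in subnet:
            best = max(best, subnet_score)
    return best


SPOOFABLE_CLASSES = {"neptune", "syn-flood"}   # hypothetical class names


def attacker_expectation(attack_class, default=1.0, spoofable=0.1):
    """Low expectation of a source-address match when the source is likely spoofed."""
    return spoofable if attack_class in SPOOFABLE_CLASSES else default


def note_compromise(meta_attackers, target_host, attack_class, succeeded):
    """After a successful root compromise, treat the victim as an attacker host."""
    if attack_class == "root-compromise" and succeeded and target_host not in meta_attackers:
        meta_attackers.append(target_host)


print(attacker_similarity(["192.0.2.10"], "192.0.2.77"))   # 0.8: same /24 subnet
```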

Our correlation component implements not just expectation of similarity (which effectively acts as a weight vector on the features used for similarity matching) but also enforces situation-specific minimum similarity. Certain features can be required to match exactly (minimum similarity for these is unity) or approximately (minimum similarity is less than unity, but strictly positive) for an alert to be considered as a candidate for fusion with another. Minimum similarity thus expresses necessary but not sufficient conditions for correlation.

The overall similarity between two alerts is zero if any overlapping feature matches at a value less than the minimum similarity for the feature (features for which no minimum similarity is specified are treated as having a minimum similarity of 0). Otherwise, overall similarity is the weighted average of the similarities of the overlapping features, using the respective expectations of similarity as weights.
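The combination rule just described translates directly into a short function. This sketch assumes the caller passes only the overlapping features, that each maps to a similarity in [0, 1], and that a feature with no stated expectation gets weight 1.0 (an assumption; the text does not say). The function and feature names are hypothetical.

```python
def overall_similarity(similarity, expectation, minimum):
    """Combine per-feature similarities for two alerts.

    similarity:  feature -> similarity in [0, 1], for overlapping features only
    expectation: feature -> expectation of similarity, used as the weight
    minimum:     feature -> minimum similarity; unspecified features default to 0
    """
    # Necessary conditions: any overlapping feature below its minimum vetoes the match.
    for feature, s in similarity.items():
        if s < minimum.get(feature, 0.0):
            return 0.0
    # Weighted average, with expectations as weights (1.0 assumed when unspecified).
    weights = {f: expectation.get(f, 1.0) for f in similarity}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    return sum(weights[f] * s for f, s in similarity.items()) / total


sim = {"source": 0.8, "target": 1.0, "attack_class": 0.6, "time": 1.0}
expect = {"source": 0.2, "target": 1.0, "attack_class": 0.8, "time": 1.0}
print(overall_similarity(sim, expect, {"target": 0.5}))   # approximately 0.88
```

In this example the imperfect source match (0.8) barely matters because its expectation is only 0.2, whereas a target similarity below its 0.5 minimum would zero the score regardless of the other features.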

Correlation Modes

By appropriate settings of similarity expectation and minimum similarity, the correlation component achieves the following hierarchy of correlation. The system is composable in that we can deploy multiple instances to obtain correlation at different stages in the hierarchy. For example, we can infer threads (within-sensor correlation) and then correlate threaded alerts from heterogeneous sensors into security incidents.

Synthetic Threads: For sensors that do not employ the thread concept, the correlation component synthesizes threads by enforcing a high minimum similarity on the sensor itself (the thread must come from a single sensor) and on the attack class, as well as on the source and target (IP addresses and ports). We have wrapped the alert messages from a leading commercial sensor and observed that this facility reliably reconstructs threads.
In particular, by placing an aggregator component topologically close to an IDS, the pair is made robust against attacks that cause the IDS itself to flood, as described in a recent NIPC advisory [NIPC01].

Security Incidents: By suppressing the minimum similarity requirement on the sensor identifier, and relaxing the expectation of similarity for this feature, we can fuse reports of the same incident from several heterogeneous sensors into a single incident report. In this case, we enforce a moderately high expectation of similarity on the attack class. This is not unity because different sensors may report a different attack class for the same attack. We construct a table of distances between attack classes that expresses which ones are acceptably close. For security incident correlation, we enforce minimum similarity on the source and target of the attack. Using this technique, we have been able to fuse alert reports from commercial and EMERALD sensors into security incident reports.

Correlated Attack Reports: By relaxing the minimum expectation of similarity on the attack
class, we are able to reconstruct various steps in a multistage attack. Each stage in an attack may
itself be a correlated security incident as described above. In this fashion, it is possible to
recognize a staged attack composed of, for example, a probe followed by an exploit to gain
access to an internal machine, and then using that machine to launch an attack against a more
critical asset.
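The three modes differ only in their settings, which a small configuration table makes concrete. In this sketch, the feature names, the MODES table, the function passes_minimums, and every number are hypothetical illustrations that follow the qualitative description above (the report does not publish its actual settings). Only the minimum-similarity gate is shown, not the weighted average.

```python
from typing import Dict, Tuple

# Hypothetical (expectation, minimum similarity) per feature for each mode.
MODES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "thread": {      # one sensor, same class, same source and target
        "sensor": (1.0, 1.0),
        "attack_class": (1.0, 0.9),
        "source": (1.0, 0.9),
        "target": (1.0, 0.9),
    },
    "incident": {    # minimum on sensor suppressed; class expectation below unity
        "sensor": (0.1, 0.0),
        "attack_class": (0.7, 0.3),
        "source": (1.0, 0.5),
        "target": (1.0, 0.5),
    },
    "scenario": {    # minimum on attack class relaxed to admit multistage attacks
        "sensor": (0.1, 0.0),
        "attack_class": (0.7, 0.0),
        "source": (1.0, 0.5),
        "target": (1.0, 0.5),
    },
}


def passes_minimums(mode: str, sims: Dict[str, float]) -> bool:
    """True if every overlapping feature meets the mode's minimum similarity."""
    settings = MODES[mode]
    return all(sims[f] >= settings[f][1] for f in sims if f in settings)


# A probe followed by an exploit, reported by two different sensors.
probe_then_exploit = {"sensor": 0.0, "attack_class": 0.2, "source": 1.0, "target": 1.0}
for mode in MODES:
    print(mode, passes_minimums(mode, probe_then_exploit))
```

With these illustrative settings, the probe/exploit pair is rejected in thread and incident modes and admitted only in scenario mode, which mirrors the progression from synthetic threads to correlated attack reports.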

Feature Fusion

When the system decides to fuse two alerts, based on aggregate similarity across common
features, the fused feature set is a superset of the features of the two alerts. Feature values in
fused alerts are typically lists, so alert fusion involves list merging. For example, suppose a probe
of certain ports on some range of the protected network matches, in terms of the port list, an
existing probe that originated from the same attacker subnet, but the target hosts in the prior alert
were in a different range of our network. The attacker address list has the new attacker address
appended, and the lists of target hosts are merged. The port list matches and is thus unchanged.

Two important features are the sensor and thread identifiers of all the component alerts, so that
the operator is always able to examine in detail the alerts that contribute to the meta alert report.

One additional feature is the priority of the meta alert, supported by our template and provided
by EMERALD sensors. We are developing a component that estimates criticality based on the
assets affected, the type of attack, the likelihood of attack success, and an administrative
preference. The aggregator maintains the high-water mark for this field. We are investigating
approaches whereby the contributing threads are permitted to update their priority downward,
computing meta alert priority as the maximum across thread priorities at any given time. This
approach would permit downward revision of the meta alert priority.
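The two priority policies can be tracked side by side, as in this sketch. The class MetaAlertPriority and its methods are hypothetical, and priorities are assumed to be numbers in [0, 1]. The high_water attribute reflects the current behavior, while current() implements the max-across-threads alternative under investigation.

```python
from typing import Dict, Tuple


class MetaAlertPriority:
    """Tracks both the high-water mark and a revisable max-over-threads priority."""

    def __init__(self) -> None:
        self.high_water = 0.0
        self.thread_priority: Dict[Tuple[str, str], float] = {}

    def update(self, sensor: str, thread: str, priority: float) -> None:
        # Each contributing thread keeps only its latest priority, which may go down.
        self.thread_priority[(sensor, thread)] = priority
        # The high-water mark never decreases.
        self.high_water = max(self.high_water, priority)

    def current(self) -> float:
        """Maximum across the latest thread priorities at this moment."""
        return max(self.thread_priority.values(), default=0.0)


p = MetaAlertPriority()
p.update("sensor-a", "t1", 0.9)
p.update("sensor-b", "t7", 0.4)
p.update("sensor-a", "t1", 0.3)   # thread t1 revises its priority downward
print(p.high_water, p.current())  # 0.9 0.4
```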

The features presently considered in the probabilistic correlator component include sensor
identification (identifier, location, name), alert thread, incident class, source and target IP lists,
target TCP/UDP port lists, source user id, target user id, and time. Computations are only over
features that overlap in the alert to be merged and the candidate meta alert into which it is to be
merged. Incident signature is used as well, but with a low expectation of similarity, as these vary
widely across heterogeneous sensors.

If present, a thread identifier from the reporting sensor overrides other match criteria. A new
alert that matches the sensor and thread of an existing meta alert is considered an update of the
earlier alert.

The correlator first tries to infer a thread by looking for an exact match in sensor identification
and incident class and signature. Note that alerts that are inferred to be from the same thread may
be separated in time. The system attempts to infer threads even in incident and scenario
operational modes.
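The order of these two checks can be sketched as follows. The dictionary layout of alerts and meta alerts, the function name thread_match, and the sensor and signature names are hypothetical. Returning None is taken here to mean that the alert falls through to the similarity test that follows; that control flow is our assumption.

```python
from typing import Any, Dict, List, Optional

MetaAlert = Dict[str, set]


def thread_match(alert: Dict[str, Any], meta_alerts: List[MetaAlert]) -> Optional[MetaAlert]:
    """Return the meta alert that this alert continues as a thread, or None."""
    thread = alert.get("thread")
    if thread is not None:
        # A sensor-supplied thread identifier overrides all other match criteria.
        key = (alert["sensor"], thread)
        return next((m for m in meta_alerts if key in m["threads"]), None)
    # No sensor thread: infer one from an exact match on sensor, class and signature.
    # Time is deliberately not compared; alerts in one thread may be far apart.
    for m in meta_alerts:
        if (alert["sensor"] in m["sensors"]
                and alert["incident_class"] in m["classes"]
                and alert["signature"] in m["signatures"]):
            return m
    return None


# Hypothetical meta alerts; sensor names and signatures are invented.
metas: List[MetaAlert] = [
    {"threads": {("ids-1", "42")}, "sensors": {"ids-1"},
     "classes": {"PROBE"}, "signatures": {"portsweep"}},
    {"threads": set(), "sensors": {"commercial-1"},
     "classes": {"DENIAL OF SERVICE"}, "signatures": {"syn-flood"}},
]
update = {"sensor": "ids-1", "thread": "42", "incident_class": "PROBE", "signature": "portsweep"}
unthreaded = {"sensor": "commercial-1", "incident_class": "DENIAL OF SERVICE", "signature": "syn-flood"}
print(thread_match(update, metas) is metas[0])      # True
print(thread_match(unthreaded, metas) is metas[1])  # True
```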

Next the system checks that all overlapping features match at least at their minimum similarity
value. Setting minimum expectation for some features to unity (not normally recommended)
causes the system to behave like a heuristic system that requires exact matches on these features.
Given that this criterion passes, we compute the overall similarity between the two alerts as
follows:

$$
\mathrm{SIM}(X, Y) = \frac{\sum_{j} E_j \, \mathrm{SIM}(X_j, Y_j)}{\sum_{j} E_j}
$$

where

X = Candidate meta alert for matching

Y = New alert

j = Index over the alert features

E_j = Expectation of similarity for feature j

X_j, Y_j = Values for feature j in alerts X and Y, respectively (may be list valued)
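As a worked illustration with hypothetical numbers (not taken from the report), suppose three features overlap and all minimum-similarity checks pass: source with E = 1.0 and similarity 1.0, target with E = 0.6 and similarity 0.5, and incident class with E = 0.3 and similarity 0.6. Then

$$
\mathrm{SIM}(X, Y) = \frac{1.0 \cdot 1.0 + 0.6 \cdot 0.5 + 0.3 \cdot 0.6}{1.0 + 0.6 + 0.3} = \frac{1.48}{1.9} \approx 0.78
$$

Because incident class carries a low expectation, its partial match lowers the result less than the target mismatch does.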

Incident class similarity is based on a notion of proximity, which at present is the result of our
judgment. The proximity of class A to B reflects how reasonably an attack currently of incident
class A may progress to class B. Note that this is not symmetric; we more strongly expect an
exploit to follow a probe than the other way around. The incident classes shown in Table 1 are
from the EMERALD 602 message format. Note that some “default” classes such as “invalid”
and “action logged” are reasonably proximal to most other classes. This occurs because the IETF
standard does not require a common ontology, and reports from heterogeneous sensors for the
same incident may not reliably represent this field. As such, we do not want to reject potential
matches based on this field alone.

For operational modes other than thread-level aggregation, we do not recommend a high
minimum similarity value for this field.

                       INV  PRIV USUB DOS  PRB  ACC  INT  SENV UENV DIST SUSP CONN BIN  LOG
PRIVILEGE VIOLATION    0.3  1.0  0.6  0.3  0.6  0.6  0.6  0.6  0.4  0.3  0.4  0.1  0.5  0.6
USER SUBVERSION        0.3  0.6  1.0  0.3  0.6  0.5  0.5  0.4  0.6  0.3  0.4  0.1  0.5  0.6
DENIAL OF SERVICE      0.3  0.3  0.3  1.0  0.6  0.3  0.3  0.4  0.3  0.5  0.4  0.1  0.5  0.6
PROBE                  0.3  0.2  0.2  0.3  1.0  0.7  0.3  0.3  0.3  0.3  0.4  0.8  0.3  0.6
ACCESS VIOLATION       0.3  0.6  0.3  0.5  0.6  1.0  0.6  0.6  0.3  0.3  0.4  0.1  0.5  0.6
INTEGRITY VIOLATION    0.3  0.5  0.3  0.5  0.6  0.8  1.0  0.6  0.5  0.3  0.4  0.1  0.5  0.6
SYSTEM ENV CORRUPTION  0.3  0.5  0.3  0.5  0.6  0.6  0.6  1.0  0.6  0.3  0.4  0.1  0.5  0.6
USER ENV CORRUPTION    0.3  0.5  0.5  0.3  0.6  0.6  0.6  0.6  1.0  0.3  0.4  0.1  0.5  0.6
ASSET DISTRESS         0.3  0.3  0.3  0.6  0.3  0.3  0.3  0.3  0.3  1.0  0.4  0.4  0.3  0.6
SUSPICIOUS USAGE       0.3  0.3  0.5  0.3  0.5  0.6  0.5  0.6  0.5  0.3  1.0  0.1  0.3  0.6
CONNECTION VIOLATION   0.3  0.1  0.1  0.3  0.8  0.3  0.3  0.3  0.3  0.5  0.4  1.0  0.3  0.6
BINARY SUBVERSION      0.3  0.3  0.3  0.3  0.3  0.6  0.6  0.6  0.5  0.3  0.4  0.1  1.0  0.6
ACTION_LOGGED          0.3  0.3  0.3  0.3  0.6  0.5  0.3  0.3  0.3  0.3  0.4  0.3  0.3  1.0

Table 1: Incident Class Similarity Matrix

Columns (in order): INV = INVALID, PRIV = PRIVILEGE VIOLATION, USUB = USER SUBVERSION, DOS = DENIAL OF SERVICE, PRB = PROBE, ACC = ACCESS VIOLATION, INT = INTEGRITY VIOLATION, SENV = SYSTEM ENV CORRUPTION, UENV = USER ENV CORRUPTION, DIST = ASSET DISTRESS, SUSP = SUSPICIOUS USAGE, CONN = CONNECTION VIOLATION, BIN = BINARY SUBVERSION, LOG = ACTION_LOGGED.


For two alerts that are extremely close in time, it is possible that the alerts may not be in time
order. In this case, incident class similarity is the greater of SIM(X, Y) and SIM(Y, X).
Mathematically, the similarity computation for incident class can comprehend a discrete call (the
alert is from one of the above classes) or a call that is a probability distribution over the above
classes (as might result from a meta alert in which the contributing sensors do not agree on the
class).
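The following Python sketch shows both kinds of incident class call against a three-class excerpt of Table 1. Reading each table row as the first class and each column as the second is our assumption, as is the use of expected proximity for a distribution-valued call; PROXIMITY, class_similarity, and class_similarity_dist are hypothetical names.

```python
from typing import Dict

# Three-class excerpt of Table 1 (row = first class, column = second; assumed).
PROXIMITY: Dict[str, Dict[str, float]] = {
    "PROBE": {"PROBE": 1.0, "PRIVILEGE VIOLATION": 0.2, "ACCESS VIOLATION": 0.7},
    "PRIVILEGE VIOLATION": {"PROBE": 0.6, "PRIVILEGE VIOLATION": 1.0, "ACCESS VIOLATION": 0.6},
    "ACCESS VIOLATION": {"PROBE": 0.6, "PRIVILEGE VIOLATION": 0.6, "ACCESS VIOLATION": 1.0},
}


def class_similarity(x_class: str, y_class: str, close_in_time: bool = False) -> float:
    """Discrete call; for nearly simultaneous alerts, take the better of both orders."""
    forward = PROXIMITY[x_class][y_class]
    if close_in_time:
        return max(forward, PROXIMITY[y_class][x_class])
    return forward


def class_similarity_dist(x_dist: Dict[str, float], y_class: str) -> float:
    """Distribution-valued call for X: expected proximity under x_dist."""
    return sum(p * PROXIMITY[c][y_class] for c, p in x_dist.items())


print(class_similarity("PROBE", "PRIVILEGE VIOLATION"))        # 0.2
print(class_similarity("PROBE", "PRIVILEGE VIOLATION", True))  # 0.6
print(class_similarity_dist({"PROBE": 0.5, "ACCESS VIOLATION": 0.5}, "PRIVILEGE VIOLATION"))
```

Taking the maximum over both orders makes the out-of-order case independent of how the table is oriented.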

Most other features are potentially list valued. For lists, the notion of similarity generally
expresses the fraction of the smaller list that is contained in the larger. For source IP addresses,
similarity also attempts to express the notion that the addresses in question may come from the
same subnet.
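A minimal sketch of the containment measure for list-valued features follows. The function name list_similarity is hypothetical, duplicate entries are ignored, and returning 0 for an empty list is an assumption; the subnet refinement for source addresses is not shown here.

```python
from typing import Hashable, Iterable


def list_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Fraction of the smaller (deduplicated) list contained in the larger one."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    smaller, larger = (set_a, set_b) if len(set_a) <= len(set_b) else (set_b, set_a)
    return len(smaller & larger) / len(smaller)


print(list_similarity([22, 23], [21, 22, 23, 80]))    # 1.0
print(list_similarity([22, 8080], [21, 22, 23, 80]))  # 0.5
```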

Time similarity is a step function that drops to 0.5 after one hour. A close match in time is
expected only when the system operates in “incident” mode. Thread and scenario aggregation
may be over time intervals of days.
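The time step function can be written in a few lines. The name time_similarity is hypothetical, and treating a gap of exactly one hour as still close is an assumption, since the text does not say on which side of the step the boundary falls.

```python
from datetime import datetime, timedelta


def time_similarity(t1: datetime, t2: datetime) -> float:
    """Step function: 1.0 within one hour, 0.5 beyond (boundary choice assumed)."""
    return 1.0 if abs(t1 - t2) <= timedelta(hours=1) else 0.5


t0 = datetime(2001, 6, 15, 17, 34, 35)
print(time_similarity(t0, t0 + timedelta(minutes=30)))  # 1.0
print(time_similarity(t0, t0 + timedelta(days=2)))      # 0.5
```

In thread and scenario modes, multi-day gaps would presumably be tolerated by giving this feature a low expectation of similarity rather than by changing the function itself.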

Correlated Alert Prioritization

In addition to rationally aggregating related alerts, a correlation system should also assign
priorities to the correlated alerts. Given a network topology and a preference for ranking different
classes of attacks as more or less critical, a security expert can accomplish this priority
assignment, but the process is manual and time consuming. We developed a Bayes system
(reusing the inference library from the Bayes TCP sensor previously described) to duplicate the
priorities that a security expert would assign to a given set of alerts. This system enables the
following functions:

• Ability to weight the priority ranking along several attribute groupings, such as attack type or
criticality of assets affected.

• Compact representation of the influence of the value of an attribute on the priority assigned.

• Incorporation of the administrator’s preference profile as to the relative importance of
observed values (such as attack type).

• Ranking influenced only by those attributes specified on a given alert; in general, a given
alert may not observe all possible attributes.

• Ability to update the ranking based on observation of a new attribute.

• Extensibility of the model to comprehend attributes that may be defined in the future, with
minimal perturbation to the rest of the model.

Computationally, our approach is to design a Bayes classifier whose output is a ranking value
and whose observable evidence consists of the attribute values. The influence of an attribute on
the output is expressed in terms of conditional probability relations.
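To illustrate the computation, here is a small Python sketch of such a classifier. It assumes a naive Bayes structure (attributes conditionally independent given the priority) and uses hypothetical priority levels, attributes, probabilities, and rank values; it is not the report's model. Attributes absent from an alert are simply skipped, so the ranking depends only on the attributes actually observed.

```python
from typing import Dict

# Hypothetical priority levels, prior, and CPTs P(attribute value | priority).
PRIOR: Dict[str, float] = {"low": 0.5, "medium": 0.3, "high": 0.2}
CPT: Dict[str, Dict[str, Dict[str, float]]] = {
    "attack_type": {
        "low": {"probe": 0.7, "dos": 0.2, "root_compromise": 0.1},
        "medium": {"probe": 0.3, "dos": 0.5, "root_compromise": 0.2},
        "high": {"probe": 0.1, "dos": 0.3, "root_compromise": 0.6},
    },
    "asset_criticality": {
        "low": {"workstation": 0.8, "server": 0.2},
        "medium": {"workstation": 0.5, "server": 0.5},
        "high": {"workstation": 0.2, "server": 0.8},
    },
}
RANK_VALUE = {"low": 0.0, "medium": 0.5, "high": 1.0}


def priority_posterior(observed: Dict[str, str]) -> Dict[str, float]:
    """Naive Bayes posterior over priority, using only the attributes observed."""
    scores = {}
    for level, prior in PRIOR.items():
        score = prior
        for attribute, value in observed.items():
            if attribute in CPT:  # attributes without a CPT are ignored
                score *= CPT[attribute][level][value]
        scores[level] = score
    total = sum(scores.values())
    return {level: s / total for level, s in scores.items()}


def ranking_value(observed: Dict[str, str]) -> float:
    """Collapse the posterior to a single ranking score in [0, 1]."""
    posterior = priority_posterior(observed)
    return sum(posterior[level] * RANK_VALUE[level] for level in posterior)


print(round(ranking_value({"attack_type": "probe"}), 3))  # 0.141
print(round(ranking_value({"attack_type": "root_compromise", "asset_criticality": "server"}), 3))  # 0.816
```

In this structure, adding a new attribute means adding one CPT entry while leaving the existing tables untouched, and re-running priority_posterior with the extra attribute updates the ranking.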

Bayes approaches and probabilistic formalisms in general represent a minority of methodologies
employed to date by intrusion detection systems as well as evolving systems for correlating and
prioritizing alerts from such systems. Theoretically, a probabilistic system needs to specify the
entire joint probability distribution of observable attributes and corresponding priority ranking.
This is extremely difficult because of the “curse of dimensionality.” Instead, the Bayes approach
is to assume that dependencies between attributes are local, so a much more compact
representation of the system’s knowledge base (local conditional probability relations) is
possible. The compactness of knowledge representation and the adaptive potential make this
approach attractive relative to signature systems.
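To make the dimensionality argument concrete, consider a hypothetical model with a priority variable of m = 3 levels and n = 10 attributes of k = 5 values each. A full joint table needs m k^n - 1 free parameters, while a naive Bayes factorization (the most local dependency structure, used here only as an illustration) needs (m - 1) + n m (k - 1):

$$
m k^{n} - 1 = 3 \cdot 5^{10} - 1 = 29{,}296{,}874
\qquad \text{versus} \qquad
(m - 1) + n\,m\,(k - 1) = 2 + 10 \cdot 3 \cdot 4 = 122
$$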

4. Alternative Approaches and Evaluation

Another form of adaptation is the potential ability to add a state, that is, a hypothesis representing
a new mode of usage. Naive Bayes models such as the one described above work well in practice
as classifiers, and are typically trained with observations for which the true class is known.
Dynamic hypothesis generation as described here takes on a more difficult problem, namely, the
situation where the data cases are unlabeled and even the underlying number of hypothesis states
is unknown. In this situation, it is legitimate to ask if a system can self-organize to a number of
hypotheses that adequately separate the important data classes. In this respect, the ability to
separate attack classes A and B from each other is less important than the ability to separate both
A and B from the set of nonattack classes.


To build this capability, we need to enable the system to add hypotheses at the root node (the
reader will recall that the root node state value is not directly observable). As a configuration
option, the system will create a “dummy state” at the root node (or more generally, at any node
that is not directly observable), with an effective count of 1. If this node has children, a new CPT
row is added at each child. At present, we use a uniform distribution over the child states for this
CPT row (each element has value 1/n, where n is the number of child states).


Adding a state then proceeds as follows. The inference mechanism is applied to an observation,
and a posterior belief is obtained for the dummy state as if it were a normal state. If this state
“wins”, it is promoted to the valid state class and the CPT rows for all children are modified via
the CPT adjustment procedure described above. Note that since the effective count of the dummy
state is 1, the adjustment makes the CPT rows look 50% like the observation. Then a new
dummy state is added, allowing the system to grow to the number of root node states that
adequately describe the data. This dummy state is not to be confused with the OTHER ATTACK
hypothesis, for which there is an initial model of nonspecific anomalous behavior (e.g., moderate
error intensity).
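The dummy-state mechanism can be sketched as follows for a root node with discrete children. Several details are assumptions rather than the report's specification: the class RootNode and its method names, a prior proportional to effective count, and the count-weighted blend that moves a promoted state's CPT rows toward the observation (with an effective count of 1, this blend yields the 50% behavior described above). The CPT adjustment for states that are already valid is not shown.

```python
from typing import Dict, List


class RootNode:
    """Root hypothesis node that grows new states via a dummy state (sketch)."""

    def __init__(self, child_cards: Dict[str, int]) -> None:
        self.child_cards = child_cards          # number of states of each child
        self.states: List[str] = []             # valid hypotheses
        self.counts: Dict[str, float] = {}      # effective counts
        self.cpts: Dict[str, Dict[str, List[float]]] = {c: {} for c in child_cards}
        self._add_dummy()

    def _add_dummy(self) -> None:
        # New dummy state: effective count 1 and a uniform CPT row at every child.
        self.dummy = f"H{len(self.states)}"
        self.counts[self.dummy] = 1.0
        for child, k in self.child_cards.items():
            self.cpts[child][self.dummy] = [1.0 / k] * k

    def _score(self, state: str, observation: Dict[str, int]) -> float:
        # Unnormalized posterior: prior proportional to count, times child likelihoods.
        score = self.counts[state]
        for child, value in observation.items():
            score *= self.cpts[child][state][value]
        return score

    def observe(self, observation: Dict[str, int]) -> str:
        """Return the winning state; promote the dummy state if it wins."""
        candidates = self.states + [self.dummy]
        winner = max(candidates, key=lambda s: self._score(s, observation))
        if winner == self.dummy:
            self._promote(observation)
        return winner

    def _promote(self, observation: Dict[str, int]) -> None:
        name, n = self.dummy, self.counts[self.dummy]
        for child, value in observation.items():
            row = self.cpts[child][name]
            # Count-weighted blend toward the observed child state.
            self.cpts[child][name] = [
                (n * p + (1.0 if i == value else 0.0)) / (n + 1) for i, p in enumerate(row)
            ]
        self.counts[name] = n + 1
        self.states.append(name)
        self._add_dummy()


root = RootNode({"errors": 3, "duration": 3})
print(root.observe({"errors": 2, "duration": 2}))  # H0: the dummy wins and is promoted
print(root.observe({"errors": 0, "duration": 0}))  # H1: a second new state
print(root.observe({"errors": 2, "duration": 2}))  # H0: an existing state wins
print(root.states)                                 # ['H0', 'H1']
```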

There are two ways to exploit the hypothesis generation capability. In the first, we initialize the
system with the normal and attack hypotheses described above, using CPTs derived from our
own domain expertise. We observe that the system does adjust the CPTs somewhat, but does not
choose to add more hypotheses when running in this fashion. From this, we tentatively conclude
that no more than 12 hypotheses are needed to classify these data.

Our next experiment examined the other extreme. We initialized the system with a single valid
hypothesis and a dummy hypothesis at the root node. We then presented a week of normal
(attack-free) data, and the system generated two valid states. As these states were generated, the
CPTs were adjusted according to the procedure previously outlined. We then arbitrarily decided
that any new states learned would be reported as potential attacks, and presented data known to
contain attacks. The system added two new states, which captured the attacks seen previously by
the 11-state expert-specified model. Therefore, with the capabilities of adaptation via
reinforcement as well as state space expansion described above, it is in fact possible to start the
system with essentially no initial knowledge. It then organizes to an appropriate number of
hypotheses and CPT values. It is interesting that this system, with only four root node hypothesis
states, does nearly as well at separating the important classes (here, attack versus nonattack) as
the expert-specified model. Normal data is adequately represented by two states, and the variety
of attack data by two abnormal states. While this does tend to separate important normal and
attack classes into separate hypotheses, explaining the result is more difficult. Nonetheless, this
minimal knowledge approach does remarkably well, and is a very favorable indicator of the
generalization potential of our methodology.

The learning procedures described above have proven useful in our experimentation, guiding us
both in refining existing hypotheses and in developing new hypotheses for normal
and attack modalities. However, we have observed better operation if the adaptive capability is
disabled, for several reasons. First, attacks and alert-worthy events are a very small fraction of
total traffic in a real-world setting, so that learning an attack modality that may be seen only once
is problematic. Second, we found that the normal hypotheses become “hardened” so as to be
relatively intolerant of erroneous outcomes. The fraction of such outcomes for nonmalicious
reasons is too high to be tolerable from an alert standpoint, but is too low to permit sufficient
“breathing room” if adaptation is permitted indefinitely. For the present, therefore, we run the
system in adaptive mode to identify unanticipated modalities and large CPT deviations from
what is observed in true traffic. We then take the results of this phase, moderate them with our
judgment (sanding the corners off very hardened hypotheses, so to speak), and arrive at a batch
specification of the CPT. We then verify that this new encoding remains sensitive against
simulated datasets (such as the Lincoln data). At present, we detect the most attacks we have
ever detected in the Lincoln data, and detect alert-worthy events in our real-world data with an
acceptable level of apparent false alerts.

5. Proposed Solution

Bayes TCP Sensor and Host Availability Monitor

The probabilistic methodologies presented above provide an important complement to heuristic
and rule-based systems for both detection and alert correlation. We explored a number of
variants in the Bayes TCP sensor, such as adaptive CPT adjustment and dynamic hypothesis
generation. While these proved to be interesting capabilities, we must at this point consider them
research features, and we do not make them active by default in the production version. We instead
employ a system that considers simultaneous TCP sessions, maintains a Bayes hypothesis that
classifies each session into one of a number of normal or misuse categories, transitions this
session state over time, and evaluates the state as new evidence is observed.

We have developed eBayes as a part of the broad EMERALD system, which permits us to
leverage a substantial component infrastructure. Specifically, it is an analytical component
that interfaces to the EMERALD ETCPGEN and EMONTCP components. ETCPGEN can
process either live TCP traffic or TCPDUMP data in batch mode. EMONTCP extracts the TCP
state for a number of generally simultaneous TCP connections. When we refer to “events”, we
mean events from EMONTCP, which already represent a considerable reduction from the raw
TCP data. There are two components in eBayes: the session monitor and the host availability
monitor.

The first of these components analyzes TCP sessions, which are imperfectly described as
temporally contiguous bursts of traffic from a given client IP. We say “imperfectly” because it is
not very important for the system to demarcate sessions exactly. The analysis is done by
Bayesian inference at periodic intervals in a session, where the interval is measured in number of
events (inference is always done when the system believes that the session has ended). Between
inference intervals, the system state is propagated according to a Markov model.
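A minimal sketch of this predict-then-update cycle follows. The names SessionMonitor, propagate, and bayes_update, the two-state example, and all numbers are hypothetical, and applying exactly one Markov step per inference interval is our simplifying assumption. The caller is assumed to supply the likelihood vector of the session evidence observed so far; how the real system summarizes events into evidence is not shown.

```python
from typing import List, Optional, Sequence


def propagate(belief: Sequence[float], transition: Sequence[Sequence[float]]) -> List[float]:
    """One Markov step: new[j] = sum_i belief[i] * T[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def bayes_update(belief: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Posterior proportional to prior belief times likelihood of the evidence."""
    unnormalized = [b * lk for b, lk in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


class SessionMonitor:
    def __init__(self, prior: Sequence[float], transition: Sequence[Sequence[float]],
                 interval: int) -> None:
        self.belief = list(prior)
        self.transition = transition
        self.interval = interval  # inference period, counted in events
        self.pending = 0          # events since the last inference

    def on_event(self, likelihood: Sequence[float],
                 session_ended: bool = False) -> Optional[List[float]]:
        """Count an event; infer every `interval` events and whenever the session ends."""
        self.pending += 1
        if not session_ended and self.pending < self.interval:
            return None
        self.belief = bayes_update(propagate(self.belief, self.transition), likelihood)
        self.pending = 0
        return self.belief


# Hypothetical states [normal, attack].
monitor = SessionMonitor([0.9, 0.1], [[0.95, 0.05], [0.10, 0.90]], interval=3)
for _ in range(3):
    result = monitor.on_event([0.2, 0.8])
print([round(b, 3) for b in result])  # [0.616, 0.384]
```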

The second component discovers what services are advertised within the domain monitored by
eBayes, and then adapts to traffic intensity and connection failure rates. It continuously estimates
belief in the operational state of these services, generating alerts when a service failure is
apparent. As such, it can potentially detect a coordinated, distributed attack where no session
appears sufficiently anomalous to the session monitor. It can also detect failures due to
nonmalicious faults.

The innovation provided by eBayes is that it captures the best features of signature-based
intrusion detection as well as anomaly detection (as in EMERALD eStat). Like signature
engines, it can embody attack models, but has the capability to adapt as systems evolve. Like
probabilistic components, it has the potential to generalize to previously unseen classes of
attacks. In addition, the system includes an adaptive capability, which can “grow” quite
reasonable models from a random start. However, since it has major attack classes encoded in its
conditional probability tables, it can provide effective detection “out of the box”.

This system detects a variety of scans and sweeps as well as flood attacks. It does not examine
packet payload, and is therefore limited to attacks that are visible in the packet headers. The session logic
achieves alert threading in the sense of producing a small number of reports for attacks that
manifest as a large number of raw events, which is typical of floods and some probes.

Probabilistic Correlation

Our probabilistic alert fusion approach considers feature overlap, feature similarity, minimum
similarity, and expectation of similarity. We maintain a list of “meta alerts” that are possibly
composed of several alerts, potentially from heterogeneous sensors. For two alerts (typically a
new alert and a meta alert), we begin by identifying features they have in common (feature
overlap). Such features include the source of the attack, the target (hosts and ports), the class of
the attack, and time information. With each feature, we have a similarity function that returns a
number between 0 and 1, with 1 corresponding to a perfect match.

Expectation of similarity is also a number between 0 and 1, and expresses our prior expectations
that the feature should match if the two alerts are related, considering the specifics of each. We
can consider expectation of similarity as a feature weighting that can vary based on the type of
correlation being performed.

If an alert from a sensor has a thread identifier that matches the list of sensor/thread identifiers
for some meta alert, the alert is considered a match and fusion is done immediately. In other
words, the individual sensor’s determination that an alert is an update of or otherwise related to
one of its own alerts overrides other considerations of alert similarity.

If the meta alert has received reports from host sensors on different hosts, we do not expect the
target host feature to match. If at least one report from a network sensor has contributed to the
meta alert and a host sensor alert is received, the expectation of similarity is that the target
address of the latter is contained in the target list of the former.

In determining whether an exploit can be plausibly considered the next stage of an attack for
which a probe was observed, we expect the target of the exploit (the features host and port) to be
contained in the target host and port list of the meta alert.

Some sensors, particularly those that maintain a degree of state, report start and end times for an
attack, while others can timestamp only a given alert. The former deal with time intervals, while
the latter do not. Similarity in time comprehends overlap of the time intervals in the alerts
considered for correlation, as well as the notion of precedence. We do not penalize time
similarity too far from unity if the time difference is plausibly due to clock drift.
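The following sketch combines interval overlap, precedence, and a clock-drift allowance in one function. The function name, the CLOCK_DRIFT and HORIZON constants, and the returned values are hypothetical; the one-hour step for a following alert borrows the time step function described earlier in this section. Point alerts are represented as zero-length intervals.

```python
from datetime import datetime, timedelta
from typing import Tuple

Interval = Tuple[datetime, datetime]  # (start, end); point alerts use start == end

CLOCK_DRIFT = timedelta(seconds=30)   # hypothetical drift tolerance
HORIZON = timedelta(hours=1)          # hypothetical step point for following alerts


def interval_time_similarity(meta: Interval, new: Interval) -> float:
    meta_start, meta_end = meta
    new_start, new_end = new
    # Intervals that overlap once widened by the drift tolerance match fully.
    if new_start <= meta_end + CLOCK_DRIFT and meta_start <= new_end + CLOCK_DRIFT:
        return 1.0
    if new_start > meta_end:
        # The new alert follows the meta alert: plausible precedence.
        return 1.0 if new_start - meta_end <= HORIZON else 0.5
    # The new alert ends well before the meta alert starts: implausible order.
    return 0.1


t0 = datetime(2001, 6, 15, 17, 0, 0)
meta = (t0, t0 + timedelta(minutes=10))
print(interval_time_similarity(meta, (t0 - timedelta(seconds=10),) * 2))  # 1.0 (drift)
print(interval_time_similarity(meta, (t0 + timedelta(hours=3),) * 2))     # 0.5
```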

Deciding whether the attacker is similar is somewhat more involved. In the case of an exact
match of originating IP address, similarity is perfect. We assign high similarity if the subnet
appears to match. In this way, a meta alert may potentially contain a list of attacker addresses. At
this point, we consider similarity based on containment. In addition, if an attacker compromises a
host within our network (as inferred by a successful outcome for an attack of the root
compromise class), that host is added to the list of attacker hosts for the meta alert in question.
Finally, for attack classes where the attacker’s address is likely to be spoofed (for example, the
Neptune attack), similarity expectation with respect to attacker address is assigned a low value.
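A sketch of the attacker comparison follows, under stated assumptions: a /24 prefix stands in for “same subnet”, 0.8 stands in for “high similarity”, and a small set of attack names marks address spoofing as likely. All three constants and the function names are hypothetical. Containment is handled by taking the best match against the meta alert's attacker list, so a compromised internal host, once appended to that list, is matched like any other attacker address.

```python
import ipaddress
from typing import Iterable

SUBNET_PREFIX = 24         # hypothetical notion of "same subnet"
SUBNET_SIMILARITY = 0.8    # hypothetical "high" similarity for a subnet match
SPOOF_PRONE = {"neptune"}  # attack names whose source address is likely spoofed


def address_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0
    subnet = ipaddress.ip_network(f"{a}/{SUBNET_PREFIX}", strict=False)
    return SUBNET_SIMILARITY if ipaddress.ip_address(b) in subnet else 0.0


def attacker_similarity(meta_attackers: Iterable[str], new_attacker: str) -> float:
    """Containment-style: best match of the new address against the attacker list."""
    return max((address_similarity(m, new_attacker) for m in meta_attackers), default=0.0)


def attacker_expectation(attack_name: str) -> float:
    """Low expectation of similarity when the source address is likely spoofed."""
    return 0.1 if attack_name.lower() in SPOOF_PRONE else 1.0


print(attacker_similarity(["198.51.100.33"], "198.51.100.47"))  # 0.8
print(attacker_expectation("Neptune"))                          # 0.1
```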

Our correlation component also enforces situation-specific minimum similarity. Certain features
can be required to match exactly (minimum similarity for these is unity) or approximately
(minimum similarity is less than unity, but strictly positive) for an alert to be considered as a
candidate for fusion with another. Minimum expectation thus expresses necessary but not
sufficient conditions for correlation.

The overall similarity between two alerts is zero if any overlapping feature matches at a value
less than the minimum similarity for the feature (features for which no minimum similarity is
specified are treated as having a minimum similarity of 0). Otherwise, overall similarity is the
weighted average of the similarities of the overlapping features, using the respective expectations
of similarity as weights.

By appropriate settings of similarity expectation and minimum similarity, the correlation
component achieves a hierarchy of correlation into threads, incidents, and scenarios. The system
is composable in that we can deploy multiple instances to obtain correlation at different stages in
the hierarchy. For example, we can infer threads (within-sensor correlation) and then correlate
threaded alerts from heterogeneous sensors into security incidents. The correlation component
can function in thread, incident, and scenario modes, and the modes may be run concurrently.


6. Results and Discussion
Bayes TCP Sensor

Lincoln Laboratory 1999 Evaluation Study

We have run our model against the TCP dump data from the 1999 Lincoln Laboratory IDEVAL
data sets [Lip00]. It is highly effective against floods and nonstealthy probe attacks, and
moderately effective against stealthy probe attacks.

This data simulates activity at a medium-size LAN with typical firewalls and gateways. Traffic
generators simulate typical volume and variety of background traffic, both intra-LAN and across
the gateway. Attack scripts of known types are executed at known times, and the traffic (a mix of
normal background as well as attack) is collected by standard utilities, such as TCPDUMP.

For this prototype we examined external-to-internal traffic using the TCP/IP protocol. This
means that console attacks, insider attacks, and attacks exploiting other protocols such as IDP
and UDP are invisible. These are not theoretical limitations, and we intend to include the UDP
protocol in the near future. However, this did limit the attacks that were visible to the system. The
fourth week of the data set was considered the most difficult, as it contained the most stealthy
attacks. We detected three visible portsweeps and missed one that accessed three ports over four
minutes with no errors. All of the portsweeps in this data set are stealthy by the standards of the
Lincoln training data and the week 5 data (we detect 100% of visible, nonstealthy sweeps). A
Satan attack and a TCPRESET attack are also detected as portsweeps. This particular Satan
attack was run in a mode where it in fact is characteristic of a portsweep. For the TCPRESET,
the portsweep hypothesis slightly edges out the OTHER hypothesis. Other detected attacks in
this data include MAILBOMB and PROCESS TABLE (both 100% detected) as well as three
password-guessing attacks (one detected as OTHER, two as DICTIONARY). The latter three
detections demonstrate the power of the approach. They were not in the set of attacks that
Lincoln thought should be detected by this sensor, so we initially considered them false alarms.
Further review of the full attack list indicated that they were in fact good detections, even though
at that time we had no DICTIONARY hypothesis and they were called OTHER. By elucidating
characteristics of these attacks, we added the DICTIONARY hypothesis (indicative of password
guessing), which now captures two of these attacks and is a close second to OTHER as a
classification for the third.

Real-World Experience

The Bayes TCP component runs on our own TCP gateway, and it has proved to be stable for
indefinite periods of time. The TCP event generator, EMONTCP, and Bayes inference
components require about 15 MB on a FreeBSD platform, and never use more than a few percent
of the CPU. For real-world traffic, we of course have no ground truth, but the results have
nonetheless proved interesting to us in the sense of scientific experimentation, as well as being of
practical interest to our system administrators.

Our initial observation was that, not surprisingly, real-world data contain many failure modes
not seen in a data set such as the IDEVAL data described above. For example, we regularly observe
http sessions of moderate or long duration in which a significant number of connections terminate
abnormally, but on such a time scale and in such modes that we are fairly certain they are not
malicious. To capture these sessions, we added the HTTP F hypothesis (for failed http). This
reduced the alert volume to a manageable 15 or so per day. A representative two-week period
comprised about 470,000 connection events, which the session model grouped into about 60,000
sessions, of which 222 produced alerts. Many of these alerts are almost certainly genuine attacks,
consisting of IP and probe sweeps and some attempted denials of service. Some of the false alert
mechanisms are understood, and we are actively working to improve the system's response to them
without making it too specific (for example, by ignoring alerts involving port 113 requests, which
are screened in our environment but are seen from normal mail clients).
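Grouping connection events into sessions, and suppressing alerts narrowly for screened ports, can be sketched as follows. This is a hypothetical illustration rather than the EMERALD session model: it assumes a session is the run of connections from one source IP to one destination IP with no gap longer than an assumed 300-second idle timeout, and it suppresses an anomalous session only when every port it touched is in a small screened set (here, port 113).

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT = 300.0      # seconds; an assumed value for illustration
SUPPRESSED_PORTS = {113}  # ident/auth requests screened at the gateway


@dataclass
class Connection:
    time: float  # connection start time in seconds
    src: str
    dst: str
    dport: int


@dataclass
class Session:
    src: str
    dst: str
    start: float
    last: float
    ports: set = field(default_factory=set)
    count: int = 0


def group_sessions(connections, idle_timeout=IDLE_TIMEOUT):
    """Group connections into (src, dst) sessions, starting a new session
    when the gap since that pair's previous connection exceeds idle_timeout."""
    open_sessions = {}
    closed = []
    for c in sorted(connections, key=lambda conn: conn.time):
        pair = (c.src, c.dst)
        s = open_sessions.get(pair)
        if s is None or c.time - s.last > idle_timeout:
            if s is not None:
                closed.append(s)
            s = Session(c.src, c.dst, start=c.time, last=c.time)
            open_sessions[pair] = s
        s.last = c.time
        s.ports.add(c.dport)
        s.count += 1
    return closed + list(open_sessions.values())


def should_alert(session, is_anomalous):
    """Report an anomalous session unless it touched only suppressed ports."""
    return is_anomalous and not session.ports <= SUPPRESSED_PORTS


# Illustrative input using documentation addresses.
conns = [
    Connection(0.0, "192.0.2.5", "198.51.100.9", 80),
    Connection(10.0, "192.0.2.5", "198.51.100.9", 80),
    Connection(1000.0, "192.0.2.5", "198.51.100.9", 80),  # after idle gap
    Connection(5.0, "192.0.2.7", "198.51.100.1", 113),
]
sessions = group_sessions(conns)
print(len(sessions), [s.count for s in sessions])
print([should_alert(s, True) for s in sessions])
```

For scale, the figures quoted above work out to just under 16 alerting sessions per day (222 over 14 days), about 0.4% of the roughly 60,000 sessions, with about 8 connection events per session on average.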

Probabilistic Correlator

Live Traffic

The following example illustrates alert correlation over time, in this case correlating alerts that
are components of a stealthy port sweep. We first show one of the contributing alerts. In the
interest of space, we do not include all the content of the alert and meta alert templates, but limit
ourselves to the fields needed to illustrate the result.


Thread ID 69156  Class= portsweep  BEL(class) = 0.994  BEL(attack) = 1.000

2001-06-15 17:34:35 from xx.yyy.148.33 ports 1064 to 1066 duration= 0.000
dest IP aaa.bbb.30.117
3 dest ports: 12345(2) 27374(3) 139


This is a probe for three vulnerable ports on a single IP address in the protected network, and it
was detected by the Bayes TCP sensor. The alert above is just a single step in a probe that
apparently took place over several days and resulted in the following correlated meta alert.


Meta Alert Thread 248

Source IPs source_IParray: xx.yyy.148.33 xx.yyy.148.47

Target IPs target_IParray: aaa.bbb.30.117 aaa.bbb.6.232 aaa.bbb.8.31
aaa.bbb.1.166 aaa.bbb.7.118 aaa.bbb.28.83 aaa.bbb.19.121 aaa.bbb.21.130
aaa.bbb.6.194 aaa.bbb.1.114 aaa.bbb.16.150

From 2001-06-15 17:34:35 to 2001-06-21 09:19:57
correlated_alert_priority -1

Ports target_TCP_portarray: 12345(4) 27374(4) 139(3)

Number of threads 10  Threads: 69156 71090 76696 84793 86412 87214 119525
124933 125331 126201
Fused: PORT_SCAN


Note that we have correlated events from two source addresses that were judged to be
sufficiently similar. The attack is quite stealthy, consisting of a small number of attempted
connections to single target hosts over a period of days. The list of thread identifiers permits the
administrator to examine any of the stages in the attack. In this case, each attack stage is
considered a portsweep; if the stages consisted of different attack classes, these would be listed
under “Attack steps”. Over the three-week period containing this attack, the IDS sensor
processed more than 200,000 sessions and generated 4439 alerts, from which the probabilistic
correlation system produced 604 meta alerts.
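How a new alert might be judged "sufficiently similar" to an existing meta alert and folded into it can be sketched as follows. The similarity functions, the weights, and the 0.6 threshold are hypothetical choices for illustration, not those of the EMERALD probabilistic correlator. Here source similarity is the fraction of leading IPv4 bits two addresses share (so neighboring hosts score high), port similarity is Jaccard overlap, and class similarity is 1 for a match and 0 otherwise. The meta alert accumulates source and target addresses, a per-port tally, a time range, and contributing thread IDs, loosely mirroring the meta alert fields shown above. Addresses, the third thread ID, and the later timestamps are illustrative.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical feature weights and merge threshold.
W_SRC, W_PORT, W_CLASS = 0.4, 0.3, 0.3
THRESHOLD = 0.6


@dataclass
class Alert:
    thread_id: int
    time: str          # ISO 8601 timestamp, so string order is time order
    src: str
    dst: str
    ports: frozenset
    attack_class: str


@dataclass
class MetaAlert:
    srcs: set = field(default_factory=set)
    dsts: set = field(default_factory=set)
    ports: Counter = field(default_factory=Counter)  # alerts touching each port
    classes: set = field(default_factory=set)
    threads: list = field(default_factory=list)
    first: str = ""
    last: str = ""

    def add(self, alert):
        """Fuse an alert into this meta alert."""
        self.srcs.add(alert.src)
        self.dsts.add(alert.dst)
        self.ports.update(alert.ports)
        self.classes.add(alert.attack_class)
        self.threads.append(alert.thread_id)
        self.first = min(self.first or alert.time, alert.time)
        self.last = max(self.last, alert.time)


def ip_similarity(a, b):
    """Fraction of the 32 IPv4 bits that the two addresses share as a prefix."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return (32 - diff.bit_length()) / 32


def similarity(alert, meta):
    """Weighted combination of source, port, and class similarity."""
    src = max(ip_similarity(alert.src, s) for s in meta.srcs)
    meta_ports = set(meta.ports)
    union = alert.ports | meta_ports
    port = len(alert.ports & meta_ports) / len(union) if union else 0.0
    cls = 1.0 if alert.attack_class in meta.classes else 0.0
    return W_SRC * src + W_PORT * port + W_CLASS * cls


def correlate(alerts, threshold=THRESHOLD):
    """Assign each alert to its most similar meta alert, or start a new one."""
    metas = []
    for alert in alerts:
        best = max(metas, key=lambda m: similarity(alert, m), default=None)
        if best is None or similarity(alert, best) < threshold:
            best = MetaAlert()
            metas.append(best)
        best.add(alert)
    return metas


alerts = [
    Alert(69156, "2001-06-15T17:34:35", "198.51.100.33", "203.0.113.117",
          frozenset({12345, 27374, 139}), "portsweep"),
    Alert(71090, "2001-06-16T02:10:00", "198.51.100.47", "203.0.113.232",
          frozenset({12345, 27374}), "portsweep"),
    Alert(80001, "2001-06-16T03:00:00", "192.0.2.10", "203.0.113.5",
          frozenset({80}), "dos"),
]
metas = correlate(alerts)
print(len(metas), metas[0].threads, sorted(metas[0].srcs))
```

Because the two port-sweep sources share 28 of 32 leading bits and probe overlapping ports with the same class, they fuse into one meta alert, while the unrelated third alert starts its own.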

In a more recent live traffic analysis experiment, we considered alerts from several EMERALD
sensors as well as SNORT, operating in August 2001 during the height of the Code Red and
Code Red II attacks. The probabilistic correlation engine considers attack class as one of the
features in its similarity matching algorithms. The mapping between attack signatures and attack
classes is implemented in the EMERALD incident handling knowledge base (IHKB), which is
shared by all EMERALD sensor and correlation components. Currently, all signatures are
mapped into the 14 IHKB classes listed in Table 2.


ACCESS VIOLATION
DENIAL OF SERVICE
SUSPICIOUS USAGE
ACTION LOGGED
INTEGRITY VIOLATION
SYSTEM ENVIRONMENT CORRUPTION
ASSET DISTRESS
INVALID
USER ENVIRONMENT CORRUPTION
BINARY SUBVERSION
PRIVILEGE VIOLATION
USER SUBVERSION
CONNECTION VIOLATION
PROBE

Table 2: EMERALD IHKB Incident Classes


Due to time constraints, we were not able to populate the mapping of SNORT alerts to
EMERALD incident classes, so SNORT alerts are assigned to a fallback “ACTION LOGGED”
class. This produces slightly lower-fidelity results, but an advantage of the probabilistic technique
is that it is sufficiently robust to tolerate this as a minor deficiency.
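A signature-to-class mapping with the fallback described above might look like the following sketch. Only the 14 class names come from Table 2; the sensor labels and signature identifiers are hypothetical placeholders, not real IHKB entries. A lookup miss yields ACTION LOGGED, so an unmapped alert still carries a class feature into similarity matching, just a less informative one.

```python
from collections import Counter

# The 14 IHKB incident classes listed in Table 2.
IHKB_CLASSES = frozenset({
    "ACCESS VIOLATION", "DENIAL OF SERVICE", "SUSPICIOUS USAGE",
    "ACTION LOGGED", "INTEGRITY VIOLATION", "SYSTEM ENVIRONMENT CORRUPTION",
    "ASSET DISTRESS", "INVALID", "USER ENVIRONMENT CORRUPTION",
    "BINARY SUBVERSION", "PRIVILEGE VIOLATION", "USER SUBVERSION",
    "CONNECTION VIOLATION", "PROBE",
})
FALLBACK_CLASS = "ACTION LOGGED"

# Hypothetical (sensor, signature) -> class entries; the real IHKB
# contents are not reproduced here.
SIGNATURE_TO_CLASS = {
    ("bayes-tcp", "portsweep"): "PROBE",
    ("bayes-tcp", "mailbomb"): "DENIAL OF SERVICE",
}


def incident_class(sensor, signature):
    """Map a sensor signature to an IHKB class, falling back to
    ACTION LOGGED when no mapping has been populated."""
    cls = SIGNATURE_TO_CLASS.get((sensor, signature), FALLBACK_CLASS)
    if cls not in IHKB_CLASSES:
        raise ValueError(f"not an IHKB class: {cls}")
    return cls


raw = [("bayes-tcp", "portsweep"), ("snort", "rule-a"), ("snort", "rule-b")]
print(Counter(incident_class(sensor, sig) for sensor, sig in raw))
```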


Because of the overall architecture of EMERALD, we are able to deploy a correlation capability
at one or more points in a monitoring network, and can in fact correlate correlated alerts. We
chose to correlate the SNORT alerts, the EMERALD alerts, and the entire set separately. The
first function of correlation, as presented in the introduction, is to reduce the raw number of alert
reports that a security administrator must examine. Table 3 reflects totals for a one-day collection
period in our laboratory (August 6-7, 2001), starting at 10 a.m. PDT.


Sensor       Raw Alerts   Correlated Alerts
Snort           4816             487
EMERALD         1586             523
Composite       6402             869

Table 3: Heterogeneous Sensor Correlation Live Traffic Results


As described above, a raw alert may fail to correlate with any other alert. In this case, the
corresponding correlated alert consists solely of the contents of the single contributing raw alert,
so the set of correlated alerts retains information for all of the raw alerts. Correlation achieves
about a 10 to 1 reduction in SNORT alerts, but only about 3 to 1 for EMERALD alerts. The smaller
reduction for EMERALD occurs because the EMERALD sensors already attempt to thread their
alerts, as discussed previously.
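The reduction ratios quoted above follow directly from Table 3, and the short computation below reproduces them. It also makes one further point visible: correlating the entire set yields fewer correlated alerts (869) than correlating each sensor's alerts separately (487 + 523 = 1010), which suggests that some SNORT and EMERALD alerts were fused with each other in the composite run.

```python
# Raw and correlated alert counts, as reported in Table 3.
TABLE3 = {
    "Snort": (4816, 487),
    "EMERALD": (1586, 523),
    "Composite": (6402, 869),
}

ratios = {name: raw / correlated for name, (raw, correlated) in TABLE3.items()}
for name, ratio in ratios.items():
    print(f"{name:<10} {ratio:5.2f} : 1")

# The composite raw count is the sum of the two sensors' raw counts.
assert TABLE3["Snort"][0] + TABLE3["EMERALD"][0] == TABLE3["Composite"][0]

# Correlated alerts when each sensor is correlated separately, versus jointly.
separate = TABLE3["Snort"][1] + TABLE3["EMERALD"][1]
joint = TABLE3["Composite"][1]
print(separate, joint, separate - joint)  # prints: 1010 869 141
```

This gives roughly 9.9 to 1 for Snort, 3.0 to 1 for EMERALD, and 7.4 to 1 for the composite set.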

Cyberpanel Grand Challenge Problem

The Cyberpanel Grand Challenge Problem (GCP) was formulated to facilitate experimentation
with alert correlation systems. The goal was to present to correlation systems a set of alerts that
was realistic in the volume and nature of its alerts, containing many nuisance attacks and one
critical attack scenario. The developer of a correlation methodology was expected to correlate the
nuisance and critical alerts, thereby reducing total alert volume to a more manageable level, and
to identify the alerts related to the critical attack as representing something more serious than the
background nuisance traffic. It is also crucial that alerts from the critical attack not spuriously
correlate with alerts from the nuisance attacks. To prioritize alerts, we activated the alert
prioritization functionality, which is based on the same Bayes inference library as the TCP sensor
and is also shared by the EMERALD MCorrelator.

Table 4 summarizes alert reduction results. The “truth” files contain alerts representing the
critical attacks, while the “all” files contain the same critical attack alerts and a large number of
nuisance attack alerts.

                         Attack 1    Attack 2
Truth    Total              117           8
         Correlated          13           3
All      Total             7216        7634
         Correlated         474         475

Table 4: Grand Challenge Problem Correlation Results

As desired, the alerts representing the critical attacks did not correlate with the other alerts.
Moreover, the correlated alerts representing critical attack scenarios were assigned priorities of
240 (on a 0 to 255 scale), while other alerts were scored below 127.
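The prioritization model itself is not described here beyond its use of the shared Bayes inference library, so the following is only a hedged illustration of how a Bayes belief could be turned into a 0 to 255 priority. It assumes a binary HIGH/LOW priority hypothesis and three hypothetical binary alert features (whether the attack succeeded, whether the target is critical, and whether the alert belongs to a multistep scenario), with made-up probabilities. The posterior belief in HIGH is scaled to 0 to 255, and 127 serves simply as the midpoint of that scale.

```python
# Hypothetical prior and likelihoods; not the EMERALD prioritization model.
PRIOR_HIGH = 0.05

# For each binary feature: (P(present | HIGH), P(present | LOW)).
LIKELIHOOD = {
    "succeeded": (0.70, 0.10),
    "critical_target": (0.80, 0.20),
    "multistep": (0.90, 0.05),
}


def belief_high(features):
    """Posterior P(HIGH | features) under a naive-Bayes assumption.
    `features` maps each feature name to True or False."""
    p_high, p_low = PRIOR_HIGH, 1.0 - PRIOR_HIGH
    for name, present in features.items():
        ph, pl = LIKELIHOOD[name]
        if present:
            p_high *= ph
            p_low *= pl
        else:
            p_high *= 1.0 - ph
            p_low *= 1.0 - pl
    return p_high / (p_high + p_low)


def priority(features):
    """Scale the belief in HIGH to an integer priority on 0..255."""
    return round(255 * belief_high(features))


scenario_step = {"succeeded": True, "critical_target": True, "multistep": True}
nuisance = {"succeeded": False, "critical_target": False, "multistep": False}
print(priority(scenario_step), priority(nuisance))  # prints: 246 0
```

The numbers are arbitrary; the point of the sketch is only that a calibrated belief maps naturally onto a bounded priority scale on which scenario alerts and nuisance alerts separate.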

7. Conclusions and Suggestions for Future Work

We have developed components for intrusion detection and intrusion report correlation that use
probabilistic techniques rather than the signature and heuristic approaches more common in the
field. We do not regard them as a replacement for those approaches, but they do provide important
complementary capabilities in the areas of generalization potential, adaptability to changing
conditions, and robustness against improperly formed or conflicting messages.

The intrusion detection components consist of a TCP session monitor and a closely coupled host
availability monitor, both based on Bayes inference. The former detects a variety of attacks
visible in TCP packet headers, while the latter discovers new network hosts and services and
detects failures (malicious or not). Coupling these sensors greatly improves the sensitivity and
false alarm rate of the overall system.

The probabilistic correlation component adapts concepts from multisensor data fusion and
introduces innovative similarity functions suited to the IDS alert correlation domain. It includes
a Bayes subsystem that reproduces the priority assignment that an expert security administrator
would give to a set of alerts. The probabilistic approach is robust in the heterogeneous sensor
environment, where sensors may not agree about the particulars of a given attack, and some
sensors may implement alert interchange standards incompletely.




We have extensive experimental and live experience with both systems. In addition, these systems
operate against live traffic at an Internet gateway for the NSA. The NSA's experience and our own
have enabled continuous refinement of the components, so that they now achieve impressive
results with high stability.

In terms of future work, we would like to explore cross-domain correlation. This is motivated in
part by the Grand Challenge Problem, and addresses the issue of a simultaneous attack against
multiple autonomous but cooperating domains. The scenario is relevant to a distributed command
mission, as well as potentially to civilian infrastructure and homeland defense.

We would also like to explore synergies between our correlation work and the Correlated Attack
Modeling (CAM) effort. Specifically, we would represent CAM models as a special class of
meta alert that is essentially a template, with appropriate wildcards for feature matching. These
would form a special set of “seed” alerts in the meta alert list. A set of alerts that matches a seed
alert would then initiate a correlated alert of the corresponding correlated attack type.
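Because the CAM seeds are proposed rather than implemented, the following is only a hypothetical sketch of the idea. It assumes a seed is an attack type plus one field template per step, where each field is either a concrete value or a wildcard, and that a correlated alert is initiated once every step is matched by at least one alert. The seed name and alert contents are illustrative; only the class names come from Table 2.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass
class Seed:
    """Hypothetical CAM-style seed: an attack type plus one field template
    per step, where WILDCARD matches any value."""
    attack_type: str
    steps: tuple  # each step is a dict: field name -> value or WILDCARD


def step_matches(step, alert):
    """True when every non-wildcard field of the step equals the alert's."""
    return all(v == WILDCARD or alert.get(k) == v for k, v in step.items())


def check_seed(seed, alerts):
    """Initiate a correlated alert when every step of the seed is matched
    by at least one alert; otherwise return None."""
    matched = []
    for step in seed.steps:
        hits = [a for a in alerts if step_matches(step, a)]
        if not hits:
            return None
        matched.extend(hits)
    return {"attack_type": seed.attack_type, "alerts": matched}


seed = Seed("PROBE_THEN_ESCALATE", (
    {"class": "PROBE", "target": WILDCARD},
    {"class": "PRIVILEGE VIOLATION", "target": WILDCARD},
))
alerts = [
    {"class": "PROBE", "target": "203.0.113.8"},
    {"class": "PRIVILEGE VIOLATION", "target": "203.0.113.8"},
    {"class": "DENIAL OF SERVICE", "target": "203.0.113.9"},
]
result = check_seed(seed, alerts)
print(result["attack_type"], len(result["alerts"]))
print(check_seed(seed, alerts[:1]))
```

A fuller version would bind wildcards to variables so that, for instance, the target of one step must equal the target of the next; that refinement is omitted here.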

We are actively pursuing opportunities to transition this technology into the OPX Analyst Work
Bench (AWB), as well as to the U.S. Army and the Federal Aviation Administration.